Implement minimal GPU culling for cameras. #12673
Closed
This commit introduces a new component, `GpuCulling`, which, when present on a camera, skips the CPU visibility check in favor of doing the frustum culling on the GPU. This trades potentially increased CPU work and draw calls for cheaper culling, and it doesn't improve the performance of any workloads that I know of today. However, it opens the door to significant optimizations in the future by taking the necessary first step toward GPU-driven rendering.

Enabling GPU culling for a view puts the rendering for that view into indirect mode. In indirect mode, CPU-level visibility checks are skipped, and all renderable entities are considered potentially visible. Bevy's batching logic still runs as usual, but it doesn't directly generate mesh instance indices. Instead, it generates instance handles, which are indices into an array of real instance indices. Before any rendering is done, for each view, a compute shader, `cull.wgsl`, maps instance handles to instance indices, discarding any instance handles that represent meshes outside the visible frustum. Draws are then done using the indirect draw feature of `wgpu`, which instructs the GPU to read the number of actual instances from the output of that compute shader.

Essentially, GPU culling works by adding a new level of indirection between the CPU's notion of instances (known as instance handles) and the GPU's notion of instances.
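To make the indirection concrete, here is a minimal CPU-side model of what a per-view culling pass does, written in Rust. It is a sketch under stated assumptions, not a port of `cull.wgsl`: it assumes each instance has a world-space bounding sphere and that the frustum is given as inward-facing planes. `Plane`, `Sphere`, `cull_batch`, and the `handle_to_instance` table are hypothetical names. `DrawIndirectArgs` here only mirrors the field order of the four `u32` values that a non-indexed indirect draw (for example, wgpu's `RenderPass::draw_indirect`) reads; indexed draws use a five-value layout.

```rust
// CPU-side model of the handle -> instance indirection plus per-batch culling.
// All names are hypothetical; this is not Bevy's API and not a port of cull.wgsl.

/// A plane with an inward-facing normal: points with dot(normal, p) + d >= 0 are inside.
#[derive(Clone, Copy)]
struct Plane {
    normal: [f32; 3],
    d: f32,
}

/// World-space bounding sphere of one mesh instance (assumed bounds representation).
#[derive(Clone, Copy)]
struct Sphere {
    center: [f32; 3],
    radius: f32,
}

/// Same field order as the four u32s a non-indexed indirect draw reads.
#[derive(Debug, Default, PartialEq)]
struct DrawIndirectArgs {
    vertex_count: u32,
    instance_count: u32,
    first_vertex: u32,
    first_instance: u32,
}

fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Conservative test: rejects a sphere only if it lies entirely outside some plane.
fn sphere_in_frustum(sphere: &Sphere, frustum: &[Plane]) -> bool {
    frustum
        .iter()
        .all(|p| dot(p.normal, sphere.center) + p.d >= -sphere.radius)
}

/// Culls one batch. Each handle indexes `handle_to_instance`; instance indices
/// whose bounds intersect the frustum are appended to `visible_out`, and the
/// batch's indirect args are set so that the draw covers exactly those survivors.
fn cull_batch(
    handles: &[u32],
    handle_to_instance: &[u32],
    bounds: &[Sphere],
    frustum: &[Plane],
    args: &mut DrawIndirectArgs,
    visible_out: &mut Vec<u32>,
) {
    args.first_instance = visible_out.len() as u32;
    args.instance_count = 0;
    for &handle in handles {
        let instance = handle_to_instance[handle as usize];
        if sphere_in_frustum(&bounds[instance as usize], frustum) {
            visible_out.push(instance);
            // A GPU version with one thread per handle would need an atomic
            // increment here, and the output order would not be deterministic.
            args.instance_count += 1;
        }
    }
}
```

Because every batch appends its survivors to one shared array, later batches get a non-zero `first_instance`. In wgpu, a non-zero first instance in an indirect draw requires the `INDIRECT_FIRST_INSTANCE` feature; this sketch does not model that constraint, and it makes no claim about how Bevy actually lays out its buffers.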
A new `--gpu-culling` flag has been added to the `many_foxes`, `many_cubes`, and `3d_shapes` examples.

Potential follow-ups include:
- Split up `RenderMeshInstances` into CPU-driven and GPU-driven parts. The former, which contains fields like the transform, won't be initialized at all when GPU culling is enabled. Instead, the transform will be written directly to the GPU in `extract_meshes`, like `extract_skins` does for joint matrices.
- Implement GPU culling for shadow maps.
- Retain bins from frame to frame so that they don't have to be rebuilt. This is a longer-term project that will build on top of #12453 ("Improve performance by binning together opaque items instead of sorting them") and several of the tasks in the renderer optimization tracking issue, #12590, such as main-world pipeline specialization.
- Implement two-phase occlusion culling on top of the new indirect mode. This allows us to move beyond simple frustum culling.
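One way to picture the proposed `RenderMeshInstances` split is the hypothetical Rust sketch below. Every name in it (`EntityId`, `Mat4`, `RenderMeshInstanceCpu`, `RenderMeshInstanceGpu`, `gpu_slot`, `extract`, `transform_of`) is an illustration rather than Bevy's API, and a plain `Vec` stands in for the GPU buffer that `extract_meshes` would write. What it shows is that in GPU-driven mode the CPU-side record never holds the transform, only a slot index into the buffer that receives it.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for Bevy's `Entity`.
type EntityId = u32;
/// Hypothetical column-major 4x4 transform.
type Mat4 = [[f32; 4]; 4];

/// CPU-driven part: the CPU owns every field, including the transform.
struct RenderMeshInstanceCpu {
    transform: Mat4,
    mesh_id: u32,
}

/// GPU-driven part: only what batching needs, plus the slot where this
/// instance's transform lives in a GPU-side buffer.
struct RenderMeshInstanceGpu {
    mesh_id: u32,
    gpu_slot: u32,
}

enum RenderMeshInstances {
    CpuDriven(HashMap<EntityId, RenderMeshInstanceCpu>),
    GpuDriven {
        instances: HashMap<EntityId, RenderMeshInstanceGpu>,
        /// Stands in for a GPU storage buffer written during extraction.
        transform_buffer: Vec<Mat4>,
    },
}

impl RenderMeshInstances {
    /// Hypothetical extraction step for one mesh entity.
    fn extract(&mut self, entity: EntityId, mesh_id: u32, transform: Mat4) {
        match self {
            RenderMeshInstances::CpuDriven(map) => {
                map.insert(entity, RenderMeshInstanceCpu { transform, mesh_id });
            }
            RenderMeshInstances::GpuDriven { instances, transform_buffer } => {
                if let Some(existing) = instances.get_mut(&entity) {
                    // Re-extracted entity: overwrite its existing slot in place.
                    existing.mesh_id = mesh_id;
                    transform_buffer[existing.gpu_slot as usize] = transform;
                } else {
                    // New entity: append its transform and remember the slot.
                    let gpu_slot = transform_buffer.len() as u32;
                    transform_buffer.push(transform);
                    instances.insert(entity, RenderMeshInstanceGpu { mesh_id, gpu_slot });
                }
            }
        }
    }

    /// Reads back the transform from wherever the current mode stores it.
    fn transform_of(&self, entity: EntityId) -> Option<Mat4> {
        match self {
            RenderMeshInstances::CpuDriven(map) => map.get(&entity).map(|i| i.transform),
            RenderMeshInstances::GpuDriven { instances, transform_buffer } => instances
                .get(&entity)
                .map(|i| transform_buffer[i.gpu_slot as usize]),
        }
    }
}
```

Reusing an entity's existing slot on re-extraction keeps the buffer from growing every frame; handling despawned entities (not shown) would need something like a free list or a compaction step.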
This PR needs a bit more polish before it's ready to go, so I'm marking it as a draft. Everything seems to work though.